Author: Bertrand Guiheneuf <Bertrand.Guiheneuf@aful.org>
Date: August 9th 1999
Last revision date : September 3rd 1999
Version: 0.2

The last version of this document is always available in gnome CVS in
the gnome-mailer module: devel-docs/misc/ref_and_id_proposition.txt



A) Identifying messages within folders
--------------------------------------

Currently, in Camel there is only one way to retrieve a message from a
mail store:
     CamelMimeMessage * 
     get_message (CamelFolder *folder, gint number)

where number is an integer representing the message rank within its
parent folder.

This is a traditional method (JavaMail, MAPI) and it is very useful
because this is often the only way to get a message in from a
classical store (pop3 for example).

Moreover, various documents ([1], [2]) proposed to generalize the URL
scheme used in Camel ([3]) to access mail stores in order to identify
messages. Such an URL would be, for instance:

pop3://po.myisp.com:1

Meaning: "Access message 1 on Pop3 server po.myisp.com"

 
However, referencing a message with its number within a folder is a
very unreliable method:

1) Message order in a folder can change during a session: 
   
   The user can move or remove messages from the folder, thus
   completely changing message numbers.  We could however imagine to
   follow message operations in order to keep camel in a coherent
   state at each time instant. This could be quite complex but may
   be feasible using gtk signal system.

2) Message order can change between sessions:

   Gnome-mailer was designed from the begining to allow messages to be
   stored in classical mailboxes (mbox, maildir, MH, IMAP ...), in
   order to allow users to run other MUA on their mailboxes if
   necessary.  These other MUA can change message order within folders
   without any chance for Camel to trace the operations.
   
These two scenarii show that it is quite impossible to use reliable
folder caching or message referencing if messages are referenced only
by their position within their parent folder.
   

We thus have to find a general way to identify and retrieve a message
within its folder. One thing is sure, however: all folders
implementation won't allow this method. Pop3 stores will always access
messages using their rank on the server. MUA using Camel will thus
have to be prepared to access some stores providing only the old
fashionned message number access method.

Basically, we have two choices:

1) Accessing messages using (mailbox) Unique ID (UID)

   A UID is a string identifier associated to a message, which is
   guaranteed to be unique within its parent folder and which will not
   change between sessions.

2) Accessing messages using Message ID

   A Message ID is a string identifier associated to a messages which
   is guaranteed to be unique in the world, that is, no other message
   can have the same Message ID. The message ID is defined in RFC 822,
   and is stored as the message header "Message-id"
   
Method (1) already exists in IMAP.  
It is quite simple to define on local stores (MH, mbox, ....) but it
may not resist to message modification by other MUA.
Methods based on Message-id matching or message content checksum seem
to be the best one. Using an "X-" header is another possibility for
non read-only folders. A combination of these three methods may be the
most reliable solution.
The UID is impossible to implement in a POP3 store provider.

(2) Can be used with IMAP, but would be very ineficient. 
The main issue with this method is its dependancy upon other MUAs and
MTAs. Message-id is set before or during message transport. Moreover,
some rfc822 compliant messages may not even have any Message-id
header.
These are major issues when accessing read-only stores.  
The M-ID is also impossible to implement in a POP3 store provider.
   

We may not rely on external MUA and MTA to guarentee the uniqueness of
the identifier . We may loose messages by never being able to read them
if two had the same uid. It would be possible to find workarounds, but
it could make Camel use a bit tricky.

Given that most users will use IMAP or a database based store as their
main mail store, and given that this stores allow UID very
easily, I suggest that we use method (1). Discussion is still open,
though.

Here are the public methods I propose to add to CamelFolder:

gboolean camel_folder_supports_uid (CamelFolder *folder)
    returns true if the folder can get messages 
    by their uid.

gchar * camel_folder_get_uid_by_number (CamelFolder *folder, gint message_number)
    return the uid of message which number in the folder
    is %message_number.

gchar * camel_folder_get_message_uid (CamelFolder *folder, CamelMimeMessage *message)
    return the uid of the message within the folder.

CamelMimeMessage *camel_folder_get_message_by_uid  (CamelFolder *folder, gchar *uid)
    return the message which uid is %uid

In addition, the CamelMessage Class will have a new public method

gchar * camel_mime_message_get_uid (CamelMimeMessage *message) 
    return the uid associated to the message in its physical parent
    folder.



B) Handling message references in (v)folders.
---------------------------------------------
 

We want the future Gnome mailer to be able to build (virtual) folders
holding references to messages physically located in other
folders. More generally, we would like folders to be able to hold:

1) messages
2) subfolders
3) references to messages

(1) and (2) are already implemented in Camel because most mail stores
can hold messages and/or subfolders.

(3) is a different issue, because no existing mail store can currently
hold, within folders, references to messages in other folders.
It will thus be a specific gnome-mailer extension. 


One of the main issue is to determine what kind of behaviour we expect
from folders holding references. Here is a possible API.

( the world (v)folder is used to distinguish between the physical
parent folder and the folder holding a reference to the message, when
a confusion may arise)

Addition to CamelFolder:

gboolean camel_folder_can_hold_references (CamelFolder *folder)
    return true if the folder can contain references

void camel_folder_add_reference_by_uid (CamelFolder *folder, gchar *folder_url, gchar *message_uid)
    add a reference into a folder. %folder_url is the url of 
    the folder, %message_uid is the uid of the message within 
    its physical parent folder.
 
void camel_folder_add_reference_by_message (CamelFolder *folder, CamelMessage *message)
    add a reference. The place where the reference points 
    to is found using CamelMessage methods

void camel_folder_remove_reference_by_uid (CamelFolder *folder, gchar *uid)
    remove a message reference form a folder. Reference 
    is identified using its uid within the folder.

gboolean camel_folder_uid_is_reference (CamelFolder *folder, gchar *uid)
    return true if the message corresponding to the uid is a reference.

Then all usual operations on the folder act if the message was
actually physically stored in this folder. For example, when the mailer
uses camel_folder_get_message_by_uid onto the (v)folder, the actual
message is retrieved from its physical store.

As you can see, the uid of the message within its physical parent
folder is different than its uid within the (v)folder. This is because
there is no way to guarantee that the uids of two messages in two
different folders would be different. Using references on this two
message in the same vfolder would break uniqueness of the uid in the
(v)folder.

A couple of other methods could be defined but all the basics are
described here.  

This draft API is far from complete nor perfect, and is described here
only to stimulate discussions before the actual implementation.


The question now is to know how we store references. There are basically 
two ways:

1) references are stored using the URL of the physical folder
   and the uid of the message within the folder

2) a list of reference is kept, and in this list, reference are stored
as in (1). Folders would refer to the actual message using index in
the list


   
The main problem with (1) is that references get lost as soon
as the actual message is moved. There is no way to find in which
folders references to the message exist. 

(2) is a way to solve this issue. When messages are used, Camel looks
in the list to see if the message is refered somewhere, and actualize 
the URL and the uid with their new values. 

The problem with (2) is that we need to keep this information in a file
and libraries writing automatically to files are generally a bad idea.

As in additional remark, it is clear that Camel will only be able to
hold references to messages on stores supporting UIDs.


Thanks in advance for your comments and ideas,


       Bertrand <Bertrand.Guiheneuf@aful.org>


--

[1] : http://www.selequa.com/%7epurp/gnomail/mail2db.html 
[2] : http://www.selequa.com/%7epurp/gnomail/dbRecFmt.html 
[3] : http://www.gnome.org/mailing-lists/archives/gnome-mailer-list/1999-April/0248.shtml
